SemanticScuttle - klotz.me » klotz: clustering+machine learning

OpenAI Embeddings and Clustering for Survey Analysis — A How-To Guide

A guide on how to use OpenAI embeddings and clustering techniques to analyze survey data and extract meaningful topics and actionable insights from the responses.

The process involves transforming textual survey responses into embeddings, grouping similar responses through clustering, and then identifying key themes or topics to aid in business improvement.
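The embed, cluster, and group-by-theme steps described above can be sketched in a few lines. This is only an illustration, not the guide's code: the `embeddings` list holds made-up 2-D stand-in vectors (real OpenAI embeddings would come from an API call and have far more dimensions), and `kmeans` and `euclidean` are hypothetical helpers written here with the standard library.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def kmeans(vectors, k, iterations=20):
    """Plain Lloyd's k-means; returns one cluster label per vector."""
    # Deterministic farthest-point initialisation (an illustrative choice).
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(vectors, key=lambda v: min(euclidean(v, c) for c in centroids))
        centroids.append(list(farthest))
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        labels = [min(range(k), key=lambda j: euclidean(v, centroids[j])) for v in vectors]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

responses = [
    "Checkout was slow",
    "Payment page kept timing out",
    "Support staff were friendly",
    "Great help from the support team",
]
# Stand-in embeddings (assumption): one toy vector per response.
embeddings = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]

labels = kmeans(embeddings, k=2)

# Group the original texts by cluster so each group can be read as a topic.
topics = {}
for text, label in zip(responses, labels):
    topics.setdefault(label, []).append(text)

for label, texts in sorted(topics.items()):
    print(label, texts)
```

Naming each topic (for example "checkout problems") is still a manual or LLM-assisted step that this sketch leaves out.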

2024-10-26 Tags: embedding, clustering, survey analysis, data science, visualization, k-means, tsne by klotz

ASCVIT V1: Automatic Statistical Calculation, Visualization, and Interpretation Tool

ASCVIT V1 aims to make data analysis easier by automating statistical calculations, visualizations, and interpretations.

Includes descriptive statistics, hypothesis tests, regression, time series analysis, clustering, and LLM-powered data interpretation.

Accepts CSV or Excel files and provides a data overview including summary statistics, variable types, and data points.
Visualizations: histograms, boxplots, pairplots, correlation matrices.
Hypothesis tests: t-tests, ANOVA, chi-square test.
Regression: linear, logistic, and multivariate.
Time series analysis.
Clustering: k-means, hierarchical clustering, DBSCAN.

Integrates with an LLM (large language model) via Ollama for automated interpretation of statistical results.

2024-09-17 Tags: foss, ascvit, statistical analysis, data visualization, llm, python, streamlit, machine learning, statistics, regression, time series, clustering, eda by klotz

HDBSCAN: The Supercharged Version of DBSCAN — An Algorithmic Deep Dive

This article provides a beginner-friendly introduction to HDBSCAN, a powerful hierarchical clustering algorithm that extends the capabilities of DBSCAN by handling varying densities more effectively. It compares HDBSCAN to DBSCAN and KMeans, highlighting the advantages of HDBSCAN in handling clusters of different shapes and sizes.

2024-09-14 Tags: hdbscan, dbscan, clustering, machine learning, data science, hierarchical clustering, density-based clustering by klotz

A Guide to Clustering Algorithms

An overview of clustering algorithms, including centroid-based (K-Means, K-Means++), density-based (DBSCAN), hierarchical, and distribution-based clustering. The article explains how each type works, its pros and cons, provides code examples, and discusses use cases.

2024-09-06 Tags: clustering, unsupervised learning, machine learning, data science, python, k-means, k-means++, dbscan, hierarchical clustering, distribution based clustering by klotz

DBSCAN, Explained in 5 Minutes

A simple and intuitive explanation of DBSCAN (Density-Based Spatial Clustering of Applications with Noise), a clustering algorithm that can identify outliers, extract new features, compress data, and perform novelty detection. The article provides a fast implementation of DBSCAN in Python.
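For readers who want the core idea in code, here is a minimal DBSCAN sketch. It is not the article's implementation and it is not fast: it uses a brute-force O(n²) neighbor search for clarity. The function name `dbscan`, its parameters `eps` and `min_samples`, and the sample points are all illustrative assumptions.

```python
import math

NOISE = -1

def dbscan(points, eps, min_samples):
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1)."""

    def neighbors(i):
        # Brute-force search: every point within eps of point i (including i).
        return [
            j for j, q in enumerate(points)
            if math.sqrt(sum((a - b) ** 2 for a, b in zip(points[i], q))) <= eps
        ]

    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may be relabelled later as a border point
            continue
        # Point i is a core point: start a new cluster and grow it.
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_samples:
                queue.extend(nbrs)  # j is also a core point, keep expanding
    return labels

# Two tight squares of points plus one isolated point.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 10), (10, 11), (11, 10), (11, 11),
    (5, 50),
]
labels = dbscan(points, eps=1.5, min_samples=3)
print(labels)
```

Points labelled `-1` belong to no dense region, which is how DBSCAN can double as an outlier detector.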

2024-08-25 Tags: dbscan, clustering, machine learning, python, density, spatial by klotz

Introduction to Interpretable Clustering

This article introduces interpretable clustering, a field that aims to provide insights into the characteristics of clusters formed by clustering algorithms. It discusses the limitations of traditional clustering methods and highlights the benefits of interpretable clustering in understanding data patterns.

2024-08-02 Tags: interpretable clustering, clustering, explainability, xai, machine learning, data analysis, data science by klotz

Why Clustering Fails

Discusses reasons why clustering in data science might not produce desired results and how to address these issues.

2024-07-06 Tags: clustering, data science, unsupervised, machine learning, hdbscan by klotz

Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach

This article discusses a method for automatically curating high-quality datasets for self-supervised pre-training of machine learning systems. The method involves successive and hierarchical applications of k-means on a large and diverse data repository to obtain clusters that distribute uniformly among data concepts, followed by a hierarchical, balanced sampling step from these clusters. The experiments on three different data domains show that features trained on the automatically curated datasets outperform those trained on uncurated data while being on par or better than ones trained on manually curated data.
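The two ingredients named above, repeated k-means and balanced sampling down the resulting hierarchy, can be illustrated on toy data. This is a rough sketch under stated assumptions, not the paper's pipeline: it uses only two levels, tiny 2-D points, and hypothetical helpers (`kmeans`, `curate`) and parameters (`k_fine`, `k_coarse`, `per_coarse`) invented for the example.

```python
import math
import random

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def kmeans(vectors, k, iterations=20):
    """Lloyd's k-means with farthest-point init; returns (labels, centroids)."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(vectors, key=lambda v: min(euclidean(v, c) for c in centroids))
        centroids.append(list(farthest))
    labels = []
    for _ in range(iterations):
        labels = [min(range(k), key=lambda j: euclidean(v, centroids[j])) for v in vectors]
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def curate(points, k_fine, k_coarse, per_coarse, seed=0):
    """Return indices of points sampled evenly across coarse clusters."""
    rng = random.Random(seed)
    # Level 1: cluster the raw points. Level 2: cluster the level-1 centroids.
    fine_labels, fine_centroids = kmeans(points, k_fine)
    coarse_labels, _ = kmeans(fine_centroids, k_coarse)
    # tree[coarse id][fine id] -> indices of the points in that fine cluster
    tree = {}
    for idx, fine in enumerate(fine_labels):
        tree.setdefault(coarse_labels[fine], {}).setdefault(fine, []).append(idx)
    chosen = []
    for coarse in sorted(tree):
        for _ in range(per_coarse):
            # Pick a non-empty fine cluster uniformly, then a point from it,
            # without replacement.
            children = [c for c in tree[coarse].values() if c]
            if not children:
                break
            child = rng.choice(children)
            chosen.append(child.pop(rng.randrange(len(child))))
    return chosen

# Imbalanced toy data: 36 points in one region, only 4 in another.
dense = [(x * 0.1, y * 0.1) for x in range(6) for y in range(6)]
sparse = [(10 + x * 0.1, 10 + y * 0.1) for x in range(2) for y in range(2)]
points = dense + sparse

chosen = curate(points, k_fine=4, k_coarse=2, per_coarse=3)
n_sparse = sum(1 for i in chosen if i >= len(dense))
print(len(chosen), n_sparse)
```

Sampling uniformly at random from `points` would pick the sparse region about one time in ten; sampling per coarse cluster gives both regions the same budget, which is the rebalancing effect the summary describes.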

2024-06-01 Tags: self-supervised learning, clustering, machine learning, k-means, feature training, llm by klotz

Mastering Customer Segmentation with LLM

Covers customer segmentation with LLMs and techniques for improving clustering models.